Abstract

We extend the Langevin Monte Carlo (LMC) algorithm to compactly supported measures
via a projection step, akin to projected Stochastic Gradient Descent (SGD). We show that
(projected) LMC allows one to sample in polynomial time from a log-concave distribution with
smooth potential. This gives a new Markov chain to sample from a log-concave distribution.
Our main result shows in particular that when the target distribution is uniform, LMC mixes in
$\tilde{O}(n^7)$ steps (where $n$ is the dimension). We also provide preliminary experimental evidence
that LMC performs at least as well as hit-and-run, for which a better mixing time of $\tilde{O}(n^4)$
was proved by Lovász and Vempala.


1 Introduction 

Let $K \subset \mathbb{R}^n$ be a convex body such that $0 \in K$, $K$ contains a Euclidean ball of radius $r$, and $K$
is contained in a Euclidean ball of radius $R$. Denote by $P_K$ the Euclidean projection on $K$. Let
$f : K \to \mathbb{R}$ be an $L$-Lipschitz and $\beta$-smooth convex function, that is, $f$ is differentiable and satisfies
$\forall x, y \in K$, $|\nabla f(x) - \nabla f(y)| \le \beta |x - y|$ and $|\nabla f(x)| \le L$. We are interested in the problem
of sampling from the probability measure $\mu$ on $\mathbb{R}^n$ whose density with respect to the Lebesgue
measure is given by:

\[
\frac{d\mu}{dx} = \frac{1}{Z} \exp(-f(x)) \, \mathbb{1}\{x \in K\}, \quad \text{where } Z = \int_{y \in K} \exp(-f(y)) \, dy.
\]

In this paper we study the following Markov chain, which depends on a parameter $\eta > 0$, and
where $\xi_1, \xi_2, \dots$ is an i.i.d. sequence of standard Gaussian random variables in $\mathbb{R}^n$:

\[
\bar{X}_{k+1} = P_K\left( \bar{X}_k - \frac{\eta}{2} \nabla f(\bar{X}_k) + \sqrt{\eta} \, \xi_k \right), \tag{1}
\]

with $\bar{X}_0 = 0$.
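
As an illustration, one step of the chain (1) is a gradient half-step, a Gaussian perturbation of variance $\eta$ per coordinate, and a projection. The following minimal sketch assumes that $K$ is a Euclidean ball centred at the origin (so that the projection is a simple rescaling) and uses a hypothetical quadratic potential; the names `project_ball` and `projected_lmc` are ours and do not come from the paper.

```python
import math
import random

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centred at 0.
    Assumption: K is a ball, which makes the projection explicit."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def projected_lmc(grad_f, project, n, eta, num_steps, rng):
    """Run the chain (1) from the origin and return the list of iterates."""
    x = [0.0] * n
    path = [x]
    for _ in range(num_steps):
        g = grad_f(x)
        # Gradient half-step plus Gaussian noise of variance eta per coordinate.
        y = [x[i] - 0.5 * eta * g[i] + math.sqrt(eta) * rng.gauss(0.0, 1.0)
             for i in range(n)]
        x = project(y)  # projection step keeps the iterate in K
        path.append(x)
    return path

# Hypothetical example: f(x) = |x|^2 / 2 on the ball of radius 2 in R^3.
rng = random.Random(0)
path = projected_lmc(lambda x: list(x), lambda y: project_ball(y, 2.0),
                     n=3, eta=0.05, num_steps=500, rng=rng)
```

For a general convex body the projection has no closed form and would itself require an optimization routine; the ball is chosen here only to keep the sketch self-contained.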
Recall that the total variation distance between two measures $\mu, \nu$ is defined as $\mathrm{TV}(\mu, \nu) =
\sup_A |\mu(A) - \nu(A)|$ where the supremum is over all measurable sets $A$. With a slight abuse of
notation we sometimes write $\mathrm{TV}(X, \nu)$ where $X$ is a random variable distributed according to
$\mu$. The notation $v_n = \tilde{O}(u_n)$ (respectively $\tilde{\Omega}$) means that there exist $c \in \mathbb{R}$, $C > 0$ such that
$v_n \le C u_n \log^c(u_n)$ (respectively $\ge$). We also say $v_n = \tilde{\Theta}(u_n)$ if one has both $v_n = \tilde{O}(u_n)$ and
$v_n = \tilde{\Omega}(u_n)$. Our main result is the following:

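Theorem 1 Assume that $r = 1$ and let $\varepsilon > 0$. Then one has $\mathrm{TV}(\bar{X}_N, \mu) \le \varepsilon$ provided that
$\eta = \tilde{\Theta}(R^2/N)$ and that $N$ satisfies the following: if $\mu$ is uniform then

\[
N = \tilde{\Omega}\left( \frac{R^6 n^7}{\varepsilon^8} \right),
\]

and otherwise

\[
N = \tilde{\Omega}\left( \frac{R^6 \max(n, RL, R\beta)^{12}}{\varepsilon^{12}} \right).
\]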
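
To make the distance used in Theorem 1 concrete, here is a small sketch computing the total variation distance between two distributions on a finite set, where the supremum over events reduces to half the $\ell_1$ distance. The function name `total_variation` and the dictionary representation are illustrative choices of ours.

```python
def total_variation(p, q):
    """TV distance between two distributions on the same finite set.

    p and q map outcomes to probabilities. On a finite set the supremum over
    events A of |p(A) - q(A)| equals half the L1 distance between p and q.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Example with a partially disjoint support.
tv = total_variation({"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.25, "c": 0.5})
```

In the continuous setting of the paper this quantity can only be estimated, for instance by binning samples, which introduces an additional discretization error not covered by this sketch.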


1.1 Context and related works 

There is a long line of works in theoretical computer science proving results similar to Theorem
1, starting with the breakthrough result of Dyer et al. [1991] who showed that the lattice walk
mixes in $\tilde{O}(n^{23})$ steps. The current record for the mixing time is obtained by Lovász and
Vempala [2007], who show a bound of $\tilde{O}(n^4)$ for the hit-and-run walk. These chains (as well as other
popular chains such as the ball walk or the Dikin walk, see e.g. Kannan and Narayanan [2012]
and references therein) all require a zeroth-order oracle for the potential $f$, that is, given $x$ one can
calculate the value $f(x)$. On the other hand our proposed chain (1) works with a first-order oracle,
that is, given $x$ one can calculate the value of $\nabla f(x)$. The difference between zeroth-order oracle
and first-order oracle has been extensively studied in the optimization literature (e.g., Nemirovski
and Yudin [1983]), but it has been largely ignored in the literature on polynomial-time sampling
algorithms. We also note that hit-and-run and LMC are the only chains which are rapidly mixing
from any starting point (see Lovász and Vempala [2006]), though they have this property for
seemingly very different reasons. When initialized in a corner of the convex body, hit-and-run might
take a long time to take a step, but once it moves it escapes very far (while a chain such as the ball
walk would only do a small step). On the other hand LMC keeps moving at every step, even when
initialized in a corner, thanks to the projection part of (1).

Our main motivation to study the chain (1) stems from its connection with the ubiquitous
stochastic gradient descent (SGD) algorithm. In general this algorithm takes the form $x_{k+1} =
P_K(x_k - \eta \nabla f(x_k) + \varepsilon_k)$ where $\varepsilon_1, \varepsilon_2, \dots$ is a centered i.i.d. sequence. Standard results in
approximation theory, such as Robbins and Monro [1951], show that if the variance of the noise
$\mathrm{Var}(\varepsilon_1)$ is of smaller order than the step-size $\eta$ then the iterates $(x_k)$ converge to the minimum
of $f$ on $K$ (for a step-size decreasing sufficiently fast as a function of the number of iterations).
For the specific noise sequence that we study in (1), the variance is exactly equal to the step-size,
which is why the chain deviates from its standard and well-understood behavior. We also note
that other regimes where SGD does not converge to the minimum of $f$ have been studied in the
optimization literature, such as the constant step-size case investigated in Pflug [1986], Bach and
Moulines [2013].


The chain (1) is also closely related to a line of works in Bayesian statistics on Langevin Monte
Carlo algorithms, starting essentially with Tweedie and Roberts [1996]. The focus there is on the
unconstrained case, that is $K = \mathbb{R}^n$. In this simpler situation, a variant of Theorem 1 was proven
in the recent paper Dalalyan [2014]. The latter result is the starting point of our work. A
straightforward way to extend the analysis of Dalalyan to the constrained case is to run the unconstrained
chain with an additional potential that diverges quickly as the distance from $x$ to $K$ increases.
However it seems much more natural to study directly the chain (1). Unfortunately the techniques
used in Dalalyan [2014] cannot deal with the singularities in the diffusion process which are
introduced by the projection. As we explain in Section 1.2 our main contribution is to develop the
appropriate machinery to study (1).

In the machine learning literature it was recently observed that Langevin Monte Carlo
algorithms are particularly well-suited for large-scale applications because of the close connection to
SGD. For instance Welling and Teh [2011] suggest using mini-batches to compute approximate
gradients instead of exact gradients in (1), and they call the resulting algorithm SGLD (Stochastic
Gradient Langevin Dynamics). It is conceivable that the techniques developed in this paper could
be used to analyze SGLD and its refinements introduced in Ahn et al. [2012]. We leave this as
an open problem for future work. Another interesting direction for future work is to improve the
polynomial dependency on the dimension and the inverse accuracy in Theorem 1 (our main goal
here was to provide the simplest polynomial-time analysis).

1.2 Contribution and paper organization 

As we pointed out above, Dalalyan [2014] proves the equivalent of Theorem 1 in the unconstrained
case. His elegant approach is based on viewing LMC as a discretization of the diffusion process
$dX_t = dW_t - \frac{1}{2} \nabla f(X_t) \, dt$, where $(W_t)$ is a Brownian motion. The analysis then proceeds in two
steps, by deriving first the mixing time of the diffusion process, and then showing that the
discretized process is 'close' to its continuous version. In Dalalyan [2014] the first step is particularly
clean as he assumes $\alpha$-strong convexity for the potential, which in turn directly gives a mixing
time of order $1/\alpha$. The second step is also rather simple once one realizes that LMC can be viewed
as the diffusion process $d\bar{X}_t = dW_t - \frac{1}{2} \nabla f(\bar{X}_{\eta \lfloor t/\eta \rfloor}) \, dt$. Using Pinsker's inequality and Girsanov's
formula it is then a short calculation to show that the total variation distance between $\bar{X}_t$ and $X_t$ is
small.

The constrained case presents several challenges, arising from the reflection of the diffusion
process on the boundary of $K$, and from the lack of curvature in the potential (indeed the
constant potential case is particularly important for us as it corresponds to $\mu$ being the uniform
distribution on $K$). Rather than a simple Brownian motion with drift, LMC with projection can be
viewed as the discretization of reflected Brownian motion with drift, which is a process of the
form $dX_t = dW_t - \frac{1}{2} \nabla f(X_t) \, dt - \nu_t L(dt)$, where $X_t \in K$, $\forall t \ge 0$, $L$ is a measure supported
on $\{t \ge 0 : X_t \in \partial K\}$, and $\nu_t$ is an outer normal unit vector of $K$ at $X_t$. The term $\nu_t L(dt)$ is
referred to as the Tanaka drift. Following Dalalyan [2014] the analysis is again decomposed in two
steps. We study the mixing time of the continuous process via a simple coupling argument, which
crucially uses the convexity of $K$ and of the potential $f$. The main difficulty is in showing that the
discretized process $(\bar{X}_t)$ is close to the continuous version $(X_t)$, as the Tanaka drift prevents us
from a straightforward application of Girsanov's formula. Our approach around this issue is to first
use a geometric argument to prove that the two processes are close in Wasserstein distance, and
then to show that in fact for a reflected Brownian motion with drift one can deduce a total variation
bound from a Wasserstein bound.

The paper is organized as follows. We start in Section 2 by proving Theorem 1 for the case
of a uniform distribution. We first remind the reader of Tanaka's construction (Tanaka [1979]) of
reflected Brownian motion in Subsection 2.1. We present our geometric argument to bound the
Wasserstein distance between $(X_t)$ and $(\bar{X}_t)$ in Subsection 2.2, and we use our coupling argument
to bound the mixing time of $(X_t)$ in Subsection 2.3. Then in Subsection 2.4 we use properties of
reflected Brownian motion to show that one can obtain a total variation bound from the Wasserstein bound
of Subsection 2.2. We conclude the proof of the first part of Theorem 1 in Subsection 2.5. In
Section 3 we generalize these arguments to an arbitrary smooth potential. Finally we conclude the
paper in Section 4 with some preliminary experimental comparison between LMC and hit-and-run.

2 The constant potential case 

In this section we prove Theorem 1 for the case where $\mu$ is uniform, that is $\nabla f = 0$. First we
introduce some useful notation. For a point $x \in \partial K$ we say that $\nu$ is an outer unit normal vector at
$x$ if $|\nu| = 1$ and

\[
\langle x - x', \nu \rangle \ge 0, \quad \forall x' \in K.
\]

For $x \notin \partial K$ we say that $0$ is an outer unit normal at $x$. Let $\| \cdot \|_K$ be the gauge of $K$ defined by

\[
\|x\|_K = \inf\{t \ge 0; \ x \in tK\}, \quad x \in \mathbb{R}^n,
\]

and $h_K$ the support function of $K$ by

\[
h_K(y) = \sup\{\langle x, y \rangle; \ x \in K\}, \quad y \in \mathbb{R}^n.
\]

Note that $h_K$ is also the gauge function of the polar body of $K$. Finally we denote $m = \int |x| \, \mu(dx)$,
and $M = \mathbb{E}[\|\theta\|_K]$, where $\theta$ is uniform on the sphere $S^{n-1}$.
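
For intuition, both functions are explicit when $K$ is a centred cube. The sketch below assumes $K = [-a, a]^n$ (our choice, not a case singled out in the paper), for which the gauge is a rescaled sup-norm and the support function a rescaled $\ell_1$ norm, and it checks numerically the inequality $\langle x, y \rangle \le \|x\|_K h_K(y)$ that is used later on.

```python
def gauge_box(x, a):
    """Gauge of K = [-a, a]^n: the smallest t >= 0 such that x lies in tK."""
    return max(abs(v) for v in x) / a

def support_box(y, a):
    """Support function of K = [-a, a]^n: the supremum of <x, y> over x in K."""
    return a * sum(abs(v) for v in y)

def inner(x, y):
    """Euclidean inner product."""
    return sum(u * v for u, v in zip(x, y))

x, y, a = [1.0, -3.0], [2.0, 0.5], 2.0
lhs = inner(x, y)                            # <x, y>
rhs = gauge_box(x, a) * support_box(y, a)    # ||x||_K * h_K(y)
```

Here `lhs` is at most `rhs`, as expected, since $x / \|x\|_K$ belongs to $K$.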

2.1 The Skorokhod problem 

Let $T \in \mathbb{R}_+ \cup \{+\infty\}$ and $w : [0, T) \to \mathbb{R}^n$ be a piecewise continuous path with $w(0) \in K$.
We say that $x : [0, T) \to \mathbb{R}^n$ and $\varphi : [0, T) \to \mathbb{R}^n$ solve the Skorokhod problem for $w$ if one has

\[
x(t) \in K, \quad \forall t \in [0, T),
\]
\[
x(t) = w(t) + \varphi(t), \quad \forall t \in [0, T),
\]

and furthermore $\varphi$ is of the form

\[
\varphi(t) = - \int_0^t \nu_s \, L(ds), \quad \forall t \in [0, T),
\]

where $\nu_s$ is an outer unit normal at $x(s)$, and $L$ is a measure on $[0, T]$ supported on the set $\{t \in
[0, T) : x(t) \in \partial K\}$.

The path $x$ is called the reflection of $w$ at the boundary of $K$, and the measure $L$ is called the
local time of $x$ at the boundary of $K$. Skorokhod showed the existence of such a pair $(x, \varphi)$
in dimension 1 in Skorokhod [1961], and Tanaka extended this result to convex sets in higher
dimensions in Tanaka [1979]. Furthermore Tanaka also showed that the solution is unique, and if
$w$ is continuous then so are $x$ and $\varphi$. In particular the reflected Brownian motion in $K$, denoted $(X_t)$,
is defined as the reflection of the standard Brownian motion $(W_t)$ at the boundary of $K$ (existence
follows by continuity of $W_t$). Observe that by Itô's formula, for any smooth function $g$ on $\mathbb{R}^n$,

\[
g(X_t) - g(X_0) = \int_0^t \langle \nabla g(X_s), dW_s \rangle + \frac{1}{2} \int_0^t \Delta g(X_s) \, ds - \int_0^t \langle \nabla g(X_s), \nu_s \rangle \, L(ds). \tag{2}
\]

To get a sense of what a solution typically looks like, let us work out the case where $w$ is
piecewise constant (this will also be useful to realize that LMC can be viewed as the solution to a
Skorokhod problem). For a sequence $g_1, \dots, g_N \in \mathbb{R}^n$ and for $\eta > 0$, we consider the path:

\[
w(t) = \sum_{k=1}^{N} g_k \, \mathbb{1}\{t \ge k\eta\}, \quad t \in [0, (N+1)\eta).
\]

Define $(x_k)_{k=0,\dots,N}$ inductively by $x_0 = 0$ and

\[
x_{k+1} = P_K(x_k + g_k).
\]

It is easy to verify that the solution to the Skorokhod problem for $w$ is given by $x(t) = x_{\lfloor t/\eta \rfloor}$ and
$\varphi(t) = - \int_0^t \nu_s \, L(ds)$, where the measure $L$ is defined by (denoting $\delta_s$ for a Dirac mass at $s$)

\[
L = \sum_{k=1}^{N} |x_k + g_k - P_K(x_k + g_k)| \, \delta_{k\eta},
\]

and for $s = k\eta$,

\[
\nu_s = \frac{x_k + g_k - P_K(x_k + g_k)}{|x_k + g_k - P_K(x_k + g_k)|}.
\]
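
The piecewise constant case can be checked numerically. The sketch below assumes that $K$ is the unit Euclidean ball and uses its own indexing (the $k$-th jump is applied to the position reached after $k-1$ jumps); it returns the reflected positions together with the masses of $L$ and the normals at the jump times, and the example verifies the identity $x = w + \varphi$ at the final time. The function names are hypothetical.

```python
import math

def project_ball(x, radius):
    """Euclidean projection onto the centred ball (assumed shape of K)."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= radius else [v * radius / norm for v in x]

def skorokhod_piecewise_constant(jumps, project):
    """Reflect the piecewise constant path whose successive jumps are given.

    Returns the reflected positions, the local-time mass placed at each jump
    time, and the outer unit normal used at each jump time.
    """
    n = len(jumps[0])
    x = [0.0] * n
    xs, masses, normals = [x], [], []
    for g in jumps:
        y = [x[i] + g[i] for i in range(n)]       # unconstrained move
        x = project(y)                            # reflected position
        push = [y[i] - x[i] for i in range(n)]    # equals mass * normal
        mass = math.sqrt(sum(v * v for v in push))
        normal = [v / mass for v in push] if mass > 0 else [0.0] * n
        xs.append(x)
        masses.append(mass)
        normals.append(normal)
    return xs, masses, normals

jumps = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.5]]
xs, masses, normals = skorokhod_piecewise_constant(
    jumps, lambda y: project_ball(y, 1.0))

# Check x = w + phi at the final time, with phi = -sum(mass_k * normal_k).
w_end = [sum(g[i] for g in jumps) for i in range(2)]
phi_end = [-sum(m * nu[i] for m, nu in zip(masses, normals)) for i in range(2)]
residual = max(abs(xs[-1][i] - w_end[i] - phi_end[i]) for i in range(2))
```

The first jump stays inside the ball, so no local time is charged; the second jump overshoots the boundary by $0.6$ and that amount is recorded as the mass of $L$ at the corresponding jump time.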


2.2 Discretization of reflected Brownian motion 


Given the discussion above, it is clear that when $f$ is a constant function, the chain (1) can be
viewed as the reflection $(\bar{X}_t)$ of a discretized Brownian motion $\bar{W}_t := W_{\eta \lfloor t/\eta \rfloor}$ at the boundary of
$K$ (more precisely the value of $\bar{X}_{k\eta}$ coincides with the value of $\bar{X}_k$ as defined by (1)). It is rather
clear that the discretized Brownian motion $(\bar{W}_t)$ is "close" to the path $(W_t)$, and we would like to
carry this to the reflected paths $(\bar{X}_t)$ and $(X_t)$. The following lemma extracted from Tanaka [1979]
allows us to do exactly that.


Lemma 1 Let $w$ and $\bar{w}$ be piecewise continuous paths and assume that $(x, \varphi)$ and $(\bar{x}, \bar{\varphi})$ solve the
Skorokhod problems for $w$ and $\bar{w}$, respectively. Then for all time $t$ we have

\[
|x(t) - \bar{x}(t)|^2 \le |w(t) - \bar{w}(t)|^2 + 2 \int_0^t \langle w(t) - \bar{w}(t) - w(s) + \bar{w}(s), \ \varphi(ds) - \bar{\varphi}(ds) \rangle.
\]
In the next lemma we control the local time at the boundary of the reflected Brownian motion $(X_t)$.


Lemma 2 We have, for all $t > 0$,

\[
\mathbb{E}\left[ \int_0^t h_K(\nu_s) \, L(ds) \right] \le \frac{nt}{2}.
\]

Proof By Itô's formula

\[
d|X_t|^2 = 2 \langle X_t, dW_t \rangle + n \, dt - 2 \langle X_t, \nu_t \rangle \, L(dt).
\]

Now observe that by definition of the reflection, if $t$ is in the support of $L$ then

\[
\langle X_t, \nu_t \rangle \ge \langle x, \nu_t \rangle, \quad \forall x \in K.
\]

In other words $\langle X_t, \nu_t \rangle \ge h_K(\nu_t)$. Therefore

\[
2 \int_0^t h_K(\nu_s) \, L(ds) \le 2 \int_0^t \langle X_s, dW_s \rangle + nt + |X_0|^2 - |X_t|^2.
\]

The first term of the right-hand side is a martingale, so using that $X_0 = 0$ and taking expectation
we get the result. ■


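Lemma 3 There exists a universal constant $C$ such that

\[
\mathbb{E}\left[ \sup_{[0,T]} \|W_t - \bar{W}_t\|_K \right] \le C M n^{1/2} \eta^{1/2} \log(T/\eta)^{1/2}.
\]

Proof Note that

\[
\mathbb{E}\left[ \sup_{[0,T]} \|W_t - \bar{W}_t\|_K \right] = \mathbb{E}\left[ \max_{0 \le i \le N-1} Y_i \right],
\]

where

\[
Y_i = \sup_{t \in [i\eta, (i+1)\eta)} \|W_t - W_{i\eta}\|_K.
\]

Observe that the variables $(Y_i)$ are identically distributed, let $p \ge 1$ and write

\[
\mathbb{E}\left[ \max_{i \le N-1} Y_i \right] \le \mathbb{E}\left[ \left( \sum_{i=0}^{N-1} |Y_i|^p \right)^{1/p} \right] \le N^{1/p} \, \|Y_0\|_p.
\]

We claim that

\[
\|Y_0\|_p \le C \sqrt{p \, n \, \eta} \; M \tag{3}
\]

for some constant $C$, and for all $p \ge 2$. Taking this for granted and choosing $p = \log(N)$ in the
previous inequality yields the result (recall that $N = T/\eta$). So it is enough to prove (3). Observe
that since $(W_t)$ is a martingale, the process

\[
M_t = \|W_t\|_K
\]

is a sub-martingale. By Doob's maximal inequality

\[
\|Y_0\|_p = \Big\| \sup_{[0,\eta]} M_t \Big\|_p \le 2 \|M_\eta\|_p,
\]

for every $p \ge 2$. Letting $\gamma_n$ be the standard Gaussian measure on $\mathbb{R}^n$ and using Khintchine's
inequality we get

\[
\|M_\eta\|_p = \sqrt{\eta} \left( \int_{\mathbb{R}^n} \|x\|_K^p \, \gamma_n(dx) \right)^{1/p} \le C \sqrt{p \, \eta} \int_{\mathbb{R}^n} \|x\|_K \, \gamma_n(dx).
\]

Lastly, integrating in polar coordinates, it is easily seen that

\[
\int_{\mathbb{R}^n} \|x\|_K \, \gamma_n(dx) \le C \sqrt{n} \, M.
\]

Hence the result. ■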

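We are now in a position to bound the average distance between $X_T$ and its discretization $\bar{X}_T$.

Proposition 1 There exists a universal constant $C$ such that for any $T \ge 0$ we have

\[
\mathbb{E}[|X_T - \bar{X}_T|] \le C \, (\eta \log(T/\eta))^{1/4} \, n^{3/4} \, T^{1/2} \, M^{1/2}.
\]

Proof Applying Lemma 1 to the processes $(W_t)$ and $(\bar{W}_t)$ at time $T = N\eta$ yields (note that
$W_T = \bar{W}_T$)

\[
|X_T - \bar{X}_T|^2 \le 2 \int_0^T \langle W_t - \bar{W}_t, \nu_t \rangle \, L(dt) - 2 \int_0^T \langle W_t - \bar{W}_t, \bar{\nu}_t \rangle \, \bar{L}(dt).
\]

We claim that the second integral is equal to 0. Indeed, since the discretized process is constant on
the intervals $[k\eta, (k+1)\eta)$ the local time $\bar{L}$ is a positive combination of Dirac point masses at

\[
\eta, 2\eta, \dots, N\eta.
\]

On the other hand $W_{k\eta} = \bar{W}_{k\eta}$ for all integer $k$, hence the claim. Therefore

\[
|X_T - \bar{X}_T|^2 \le 2 \int_0^T \langle W_t - \bar{W}_t, \nu_t \rangle \, L(dt).
\]

Using the inequality $\langle x, y \rangle \le \|x\|_K \, h_K(y)$ we get

\[
|X_T - \bar{X}_T|^2 \le 2 \sup_{[0,T]} \|W_t - \bar{W}_t\|_K \int_0^T h_K(\nu_t) \, L(dt).
\]

Taking the square root, expectation and using Cauchy-Schwarz we get

\[
\mathbb{E}[|X_T - \bar{X}_T|]^2 \le 2 \, \mathbb{E}\left[ \sup_{[0,T]} \|W_t - \bar{W}_t\|_K \right] \mathbb{E}\left[ \int_0^T h_K(\nu_t) \, L(dt) \right].
\]

Applying Lemma 2 and Lemma 3, we get the result. ■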
2.3 A mixing time estimate for the reflected Brownian motion

The reflected Brownian motion is a Markov process. We let $(P_t)$ be the associated semi-group:

\[
P_t f(x) = \mathbb{E}_x[f(X_t)],
\]

for every test function $f$, where $\mathbb{E}_x$ means conditional expectation given $X_0 = x$. Itô's formula
shows that the generator of the semigroup $(P_t)$ is $(1/2)\Delta$ with Neumann boundary condition.
Then by Stokes' formula, it is easily seen that $\mu$ (the uniform measure on $K$ normalized to be a
probability measure) is the stationary measure of this process, and is even reversible. In this section
we estimate the total variation between the law of $X_t$ and $\mu$.

Given a probability measure $\nu$ supported on $K$, we let $\nu P_t$ be the law of $X_t$ when $X_0$ has law $\nu$.
The following lemma is the key result to estimate the mixing time of the process $(X_t)$.


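Lemma 4 Let $x, x' \in K$. Then

\[
\mathrm{TV}(\delta_x P_t, \delta_{x'} P_t) \le \frac{|x - x'|}{\sqrt{2\pi t}}.
\]

Proof Let $(W_t)$ be a Brownian motion starting from 0 and let $(X_t)$ be a reflected Brownian motion
starting from $x$:

\[
\begin{cases}
X_0 = x \\
dX_t = dW_t - \nu_t \, L(dt)
\end{cases} \tag{4}
\]

where $(\nu_t)$ and $L$ satisfy the appropriate conditions. We construct a reflected Brownian motion
$(X'_t)$ starting from $x'$ as follows. Let

\[
\tau = \inf\{t \ge 0; \ X_t = X'_t\},
\]

and for $t < \tau$ let $S_t$ be the orthogonal reflection with respect to the hyperplane $(X_t - X'_t)^\perp$. Then
up to time $\tau$, the process $(X'_t)$ is defined by

\[
\begin{cases}
X'_0 = x' \\
dX'_t = dW'_t - \nu'_t \, L'(dt) \\
dW'_t = S_t(dW_t)
\end{cases} \tag{5}
\]

where $L'$ is a measure supported on

\[
\{t \le \tau; \ X'_t \in \partial K\}
\]

and $\nu'_t$ is an outer unit normal at $X'_t$ for all such $t$. After time $\tau$ we just set $X'_t = X_t$. Since $S_t$ is an
orthogonal map $(W'_t)$ is a Brownian motion and thus $(X'_t)$ is a reflected Brownian motion starting
from $x'$. Therefore

\[
\mathrm{TV}(\delta_x P_t, \delta_{x'} P_t) \le \mathbb{P}(X_t \ne X'_t) = \mathbb{P}(\tau > t).
\]

Observe that on $[0, \tau)$

\[
dW_t - dW'_t = (I - S_t)(dW_t) = 2 \langle V_t, dW_t \rangle V_t,
\]

where

\[
V_t = \frac{X_t - X'_t}{|X_t - X'_t|}.
\]

So

\[
d(X_t - X'_t) = 2 \langle V_t, dW_t \rangle V_t - \nu_t \, L(dt) + \nu'_t \, L'(dt)
= 2 (dB_t) V_t - \nu_t \, L(dt) + \nu'_t \, L'(dt),
\]

where

\[
B_t = \int_0^t \langle V_s, dW_s \rangle, \quad \text{on } [0, \tau).
\]

Observe that $(B_t)$ is a one-dimensional Brownian motion. Itô's formula then gives

\[
\begin{aligned}
dg(X_t - X'_t) = {} & 2 \langle \nabla g(X_t - X'_t), V_t \rangle \, dB_t - \langle \nabla g(X_t - X'_t), \nu_t \rangle \, L(dt) \\
& + \langle \nabla g(X_t - X'_t), \nu'_t \rangle \, L'(dt) + 2 \nabla^2 g(X_t - X'_t)(V_t, V_t) \, dt,
\end{aligned}
\]

for every $g$ which is smooth in a neighborhood of $X_t - X'_t$. Now if $g(x) = |x|$ then

\[
\nabla g(X_t - X'_t) = V_t
\]

so

\[
\begin{aligned}
& \langle \nabla g(X_t - X'_t), V_t \rangle = 1, \\
& \langle \nabla g(X_t - X'_t), \nu_t \rangle \ge 0, \quad \text{on the support of } L, \\
& \langle \nabla g(X_t - X'_t), \nu'_t \rangle \le 0, \quad \text{on the support of } L'.
\end{aligned} \tag{6}
\]

Moreover

\[
\nabla^2 g(X_t - X'_t) = \frac{1}{|X_t - X'_t|} \, P_{(X_t - X'_t)^\perp},
\]

where $P_{x^\perp}$ denotes the orthogonal projection on $x^\perp$. In particular

\[
\nabla^2 g(X_t - X'_t)(V_t, V_t) = 0.
\]

We obtain

\[
|X_t - X'_t| \le |x - x'| + 2 B_t, \quad \text{on } [0, \tau).
\]

Therefore

\[
\mathbb{P}(\tau > t) \le \mathbb{P}(\tau' > t)
\]

where $\tau'$ is the first time the Brownian motion $(B_t)$ hits the value $-|x - x'|/2$. Now by the reflection
principle

\[
\mathbb{P}(\tau' > t) = 2 \, \mathbb{P}(0 \le 2 B_t < |x - x'|) \le \frac{|x - x'|}{\sqrt{2\pi t}}.
\]

Hence the result. ■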
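
The last step of the proof can be verified numerically. Since $B_t$ is a centred Gaussian with variance $t$, the reflection principle gives $\mathbb{P}(\tau' > t) = \mathbb{P}(|B_t| < d/2) = \mathrm{erf}\big(d / (2\sqrt{2t})\big)$ with $d = |x - x'|$, and the bound of Lemma 4 follows from $\mathrm{erf}(z) \le 2z/\sqrt{\pi}$. The sketch below (function names are ours) compares the exact tail with the bound.

```python
import math

def coupling_tail(d, t):
    """P(tau' > t): a one-dimensional Brownian motion started at 0 has not
    hit the level -d/2 by time t. By the reflection principle this equals
    P(|B_t| < d/2)."""
    return math.erf(d / (2.0 * math.sqrt(2.0 * t)))

def lemma4_bound(d, t):
    """Right-hand side of Lemma 4 for |x - x'| = d."""
    return d / math.sqrt(2.0 * math.pi * t)

# Pairs (d, t, exact tail, bound) for a few distances and times.
checks = [(d, t, coupling_tail(d, t), lemma4_bound(d, t))
          for d in (0.1, 1.0, 5.0) for t in (0.01, 1.0, 100.0)]
```

The bound is sharp when $d / \sqrt{t}$ is small and becomes vacuous (larger than 1) when $d / \sqrt{t}$ is large.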

The above result clearly implies that for a probability measure $\nu$ on $K$,

\[
\mathrm{TV}(\delta_0 P_t, \nu P_t) \le \frac{\int_K |x| \, \nu(dx)}{\sqrt{2\pi t}}.
\]

Since $\mu$ is stationary, we obtain

\[
\mathrm{TV}(\delta_0 P_t, \mu) \le \frac{m}{\sqrt{2\pi t}} \tag{7}
\]

for any $t > 0$. In other words, starting from 0, the mixing time of $(X_t)$ is of order at most $m^2$.
Notice also that Lemma 4 allows us to bound the mixing time from any starting point: for every
$x \in K$, we have

\[
\mathrm{TV}(\delta_x P_t, \mu) \le \frac{R}{\sqrt{2\pi t}},
\]

where $R$ is the diameter of $K$. Letting $\tau_{\mathrm{mix}}$ be the mixing time of $(X_t)$, namely the smallest time $t$
for which

\[
\sup_{x \in K} \{ \mathrm{TV}(\delta_x P_t, \mu) \} \le \frac{1}{e},
\]

we obtain from the previous display $\tau_{\mathrm{mix}} \le 2R^2$. Since for any $x$ and $t$ we have $\mathrm{TV}(\delta_x P_t, \mu) \le
e^{-\lfloor t/\tau_{\mathrm{mix}} \rfloor}$ (see e.g., [Levin et al., 2008, Lemma 4.12]) we obtain in particular

\[
\mathrm{TV}(\delta_0 P_t, \mu) \le e \cdot e^{-t/(2R^2)}.
\]

The advantage of this upon (7) is the exponential decay in $t$. On the other hand, since obviously
$m \le R$, inequality (7) can be more precise for a certain range of $t$. The next proposition sums up
the results of this section.


Proposition 2 For any $t > 0$, we have

\[
\mathrm{TV}(\delta_0 P_t, \mu) \le C \min\left( m \, t^{-1/2}, \ e^{-t/(2R^2)} \right),
\]

where $C$ is a universal constant.


2.4 From Wasserstein distance to total variation 

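In the following lemma, which is a variation on the reflection principle, $(W_t)$ is a Brownian motion,
the notation $\mathbb{P}_x$ means probability given $W_0 = x$ and $(Q_t)$ denotes the heat semigroup:

\[
Q_t h(x) = \mathbb{E}_x[h(W_t)],
\]

for every test function $h$.

Lemma 5 Let $x \in K$ and let $\sigma$ be the first time $(W_t)$ hits the boundary of $K$. Then for all $t > 0$

\[
\mathbb{P}_x(\sigma < t) \le 2 \, \mathbb{P}_x(W_t \notin K) = 2 \, Q_t(\mathbb{1}_{K^c})(x).
\]

Proof Let $(\mathcal{F}_t)$ be the natural filtration of the Brownian motion. Fix $t > 0$. By the strong Markov
property

\[
\mathbb{P}_x(W_t \notin K \mid \mathcal{F}_\sigma) = u(\sigma, W_\sigma), \tag{8}
\]

where

\[
u(s, y) = \mathbb{1}\{s < t\} \, \mathbb{P}_y(W_{t-s} \notin K).
\]

Let $y \in \partial K$, since $K$ is convex it admits a supporting hyperplane $H$ at $y$. Let $H_+$ be the halfspace
delimited by $H$ containing $K$. Then for any $u > 0$

\[
\mathbb{P}_y(W_u \notin K) \ge \mathbb{P}_y(W_u \notin H_+) = \frac{1}{2}.
\]

Equality (8) thus yields

\[
\mathbb{P}_x(W_t \notin K \mid \mathcal{F}_\sigma) \ge \frac{1}{2} \, \mathbb{1}\{\sigma < t\}
\]

almost surely. Taking expectation yields the result. ■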

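We also need the following elementary estimate for the heat semigroup.

Lemma 6 For any $s \ge 0$

\[
\int_K Q_s(\mathbb{1}_{K^c}) \, dx \le \sqrt{s} \; \mathcal{H}^{n-1}(\partial K),
\]

where $\mathcal{H}^{n-1}(\partial K)$ is the Hausdorff measure of the boundary of $K$.

Proof Let $\varphi(s) = \int_K Q_s(\mathbb{1}_{K^c}) \, dx$. Then by definition of the heat semigroup and Stokes' formula

\[
\varphi'(s) = \frac{1}{2} \int_K \Delta Q_s(\mathbb{1}_{K^c}) \, dx = \frac{1}{2} \int_{\partial K} \langle \nabla Q_s(\mathbb{1}_{K^c})(x), \nu(x) \rangle \, \mathcal{H}^{n-1}(dx),
\]

for every $s > 0$ and where $\nu(x)$ is an outer unit normal vector at point $x$. On the other hand an
elementary computation shows that for every $s > 0$

\[
|\nabla Q_s(\mathbb{1}_{K^c})| \le s^{-1/2}, \tag{9}
\]

pointwise. We thus obtain

\[
|\varphi'(s)| \le \frac{\mathcal{H}^{n-1}(\partial K)}{2 \sqrt{s}},
\]

for every $s > 0$. Integrating this inequality between 0 and $s$ yields the result. ■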
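
Lemma 6 is easy to check in dimension one. The sketch below assumes $K = [-1, 1]$, so that the boundary consists of two points and its Hausdorff measure of dimension zero equals 2; the heat semigroup applied to the indicator of the complement is then an explicit Gaussian tail, and its integral over $K$ is approximated by a midpoint rule. All function names are ours.

```python
import math

def std_normal_cdf(z):
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exit_probability(x, s):
    """Q_s(1_{K^c})(x) for K = [-1, 1]: the probability that x + sqrt(s) * Z
    falls outside K, where Z is a standard Gaussian."""
    sd = math.sqrt(s)
    inside = std_normal_cdf((1.0 - x) / sd) - std_normal_cdf((-1.0 - x) / sd)
    return 1.0 - inside

def integrated_exit_probability(s, cells=2000):
    """Midpoint-rule approximation of the integral of Q_s(1_{K^c}) over K."""
    h = 2.0 / cells
    return h * sum(exit_probability(-1.0 + (i + 0.5) * h, s)
                   for i in range(cells))

# Pairs (left-hand side, right-hand side) of Lemma 6 for a few values of s.
results = [(integrated_exit_probability(s), 2.0 * math.sqrt(s))
           for s in (0.01, 0.1, 1.0)]
```

In this example the left-hand side stays below the bound $2\sqrt{s}$ for each tested value of $s$.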


Proposition 3 Let $T, S$ be integer multiples of $\eta$. Then

\[
\mathrm{TV}(X_{T+S}, \bar{X}_{T+S}) \le \frac{3 \, \mathbb{E}[|X_T - \bar{X}_T|]}{\sqrt{S}} + \mathrm{TV}(X_T, \mu) + 4 \sqrt{S} \; \mathcal{H}^{n-1}(\partial K) \, |K|^{-1}.
\]

Proof We use the coupling by reflection again. Fix $x$ and $x'$ in $K$. Let $(X_t)$ and $(X'_t)$ be two
Brownian motions reflected at the boundary of $K$ starting from $x$ and $x'$ respectively, such that
the underlying Brownian motions $(W_t)$ and $(W'_t)$ are coupled by reflection, just as in the proof of
Lemma 4. Let $(\bar{X}'_t)$ be the discretization of $(X'_t)$, namely the solution of the Skorokhod problem
for the process $(W'_{\eta \lfloor t/\eta \rfloor})$. Let $S$ be an integer multiple of $\eta$. Obviously, if $(X_t)$ and $(X'_t)$ have
merged before time $S$ and in the meantime neither $(X_t)$ nor $(X'_t)$ has hit the boundary of $K$ then

\[
X_S = X'_S = \bar{X}'_S.
\]

Therefore, letting $\tau$ be the first time $X_t = X'_t$ and $\sigma$ and $\sigma'$ be the first times $(X_t)$ and $(X'_t)$ hit the
boundary of $K$, respectively, we have

\[
\mathbb{P}(X_S \ne \bar{X}'_S) \le \mathbb{P}(\tau > S) + \mathbb{P}(\sigma \le S) + \mathbb{P}(\sigma' \le S). \tag{10}
\]

As we have seen before, the coupling time $\tau$ satisfies

\[
\mathbb{P}(\tau > S) \le \frac{|x - x'|}{\sqrt{2\pi S}}.
\]

On the other hand Lemma 5 gives

\[
\mathbb{P}(\sigma \le S) \le 2 \, Q_S(\mathbb{1}_{K^c})(x),
\]

and similarly for $\sigma'$. Notice also that the estimate (9) implies that

\[
Q_S(\mathbb{1}_{K^c})(x') \le Q_S(\mathbb{1}_{K^c})(x) + \frac{|x - x'|}{\sqrt{S}}.
\]

Plugging everything back into (10) yields

\[
\mathbb{P}(X_S \ne \bar{X}'_S) \le \frac{3 |x - x'|}{\sqrt{S}} + 4 \, Q_S(\mathbb{1}_{K^c})(x). \tag{11}
\]

Now let $T$ and $S$ be two integer multiples of $\eta$ and assume that $(X_t)$ and $(\bar{X}_t)$ start from 0 and are
coupled using the same Brownian motion up to time $T$, and using the reflection coupling between
time $T$ and $T + S$. Then, by the Markov property and (11) we get

\[
\mathbb{P}(X_{T+S} \ne \bar{X}_{T+S} \mid X_T, \bar{X}_T) \le \frac{3 |X_T - \bar{X}_T|}{\sqrt{S}} + 4 \, Q_S(\mathbb{1}_{K^c})(X_T).
\]

Now we take expectation, and observe that by Lemma 6

\[
\mathbb{E}[Q_S(\mathbb{1}_{K^c})(X_T)] \le \mathrm{TV}(X_T, \mu) + \int_K Q_S(\mathbb{1}_{K^c}) \, d\mu
\le \mathrm{TV}(X_T, \mu) + \sqrt{S} \; \mathcal{H}^{n-1}(\partial K) \, |K|^{-1}.
\]

Putting everything together we get the result. ■


2.5 Proof of the main result 

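Let $S, T$ be integer multiples of $\eta$. Writing

\[
\mathrm{TV}(\bar{X}_{T+S}, \mu) \le \mathrm{TV}(\bar{X}_{T+S}, X_{T+S}) + \mathrm{TV}(X_{T+S}, \mu)
\]

and using Proposition 1 and Proposition 3 yields

\[
\mathrm{TV}(\bar{X}_{T+S}, \mu) \le C \, (\eta \log(T/\eta))^{1/4} \, n^{3/4} \, T^{1/2} \, M^{1/2} \, S^{-1/2} + 2 \, \mathrm{TV}(X_T, \mu)
+ 4 S^{1/2} \, \mathcal{H}^{n-1}(\partial K) \, |K|^{-1}. \tag{12}
\]

For the sake of simplicity let us assume that $K$ contains the Euclidean ball of radius 1, and let us aim
at a result depending only on the diameter $R$ of $K$. So we shall use the trivial estimates

\[
m \le R, \quad M \le \frac{1}{r} \le 1,
\]

together with the less trivial but nevertheless true

\[
\mathcal{H}^{n-1}(\partial K) \le n |K|.
\]

Next we use Proposition 2 to bound $\mathrm{TV}(X_T, \mu)$ and (12) becomes

\[
\mathrm{TV}(\bar{X}_{T+S}, \mu) \le C \left( (\eta \log(T/\eta))^{1/4} \, n^{3/4} \, T^{1/2} \, S^{-1/2} + e^{-T/(2R^2)} + n S^{1/2} \right).
\]

Given a small positive constant $\varepsilon$, we have to pick $S, T, \eta$ so that the right-hand side of the previous
inequality equals $\varepsilon$. So we need to take

\[
T \approx R^2 \log(1/\varepsilon), \quad S \approx \frac{\varepsilon^2}{n^2},
\]

and to choose $\eta$ so that

\[
\frac{\eta}{T} \log\left( \frac{T}{\eta} \right) \approx \frac{\varepsilon^8}{R^6 \, n^7 \log(1/\varepsilon)^3}.
\]

Since for small $\zeta$ we have

\[
\zeta \log(1/\zeta) \approx \xi \iff \zeta \approx \frac{\xi}{\log(1/\xi)},
\]

and assuming that $R$ and $1/\varepsilon$ are at most polynomial in $n$, we obtain

\[
\eta \approx \frac{\varepsilon^8}{R^4 \, n^7 \log(n)^3}.
\]

To sum up: Let $(\xi_k)$ be a sequence of i.i.d. standard Gaussian vectors, choose the value of $\eta$ given
above and run the algorithm

\[
\begin{cases}
\bar{X}_0 = 0 \\
\bar{X}_{k+1} = P_K\left( \bar{X}_k + \sqrt{\eta} \, \xi_{k+1} \right)
\end{cases}
\]

for a number of steps equal to

\[
N = \frac{T + S}{\eta} \approx \frac{R^6 \, n^7 \log(n)^4}{\varepsilon^8}.
\]

Then the total variation between $\bar{X}_N$ and the uniform measure on $K$ is at most $\varepsilon$.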
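
The parameter choice can be sketched in code. The function below is an illustration under our own simplifying assumptions: it sets the universal constant to 1, drops the factor $\log(T/\eta)$, and makes each of the three terms $\eta^{1/4} n^{3/4} T^{1/2} S^{-1/2}$, $e^{-T/(2R^2)}$ and $n S^{1/2}$ equal to $\varepsilon$. It therefore reproduces the polynomial scaling of the step size in $n$, $R$ and $\varepsilon$, not the exact values of the analysis. The name `uniform_case_parameters` is hypothetical.

```python
import math

def uniform_case_parameters(n, radius, eps):
    """Heuristic choice of (S, T, eta, number of steps) in the uniform case.

    Assumptions: universal constants are set to 1, the factor log(T / eta)
    is ignored, and 0 < eps < 1.
    """
    s = (eps / n) ** 2                            # n * sqrt(S) = eps
    t = 2.0 * radius ** 2 * math.log(1.0 / eps)   # exp(-T / (2 R^2)) = eps
    # eta^(1/4) * n^(3/4) * sqrt(T / S) = eps, solved for eta.
    eta = eps ** 4 * s ** 2 / (n ** 3 * t ** 2)
    steps = math.ceil((t + s) / eta)
    return s, t, eta, steps

s, t, eta, steps = uniform_case_parameters(n=2, radius=1.0, eps=0.5)
```

With this choice the step size scales like $\varepsilon^8 / (n^7 R^4)$ up to logarithmic factors, so doubling the dimension divides $\eta$ by $2^7$.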


3 The general case 

In the previous section we viewed LMC (for a constant function $f$) as a discretization of reflected
Brownian motion $(X_t)$ defined by $dX_t = dW_t - \nu_t L(dt)$ and $X_0 = 0$. In this section $(X_t)$
is a slightly more complicated process: it is a diffusion reflected at the boundary of $K$. More
specifically $(X_t)$ satisfies

\[
\begin{cases}
X_t \in K, \quad \forall t \ge 0 \\
dX_t = dW_t - \frac{1}{2} \nabla f(X_t) \, dt - \nu_t \, L(dt),
\end{cases} \tag{13}
\]

where $L$ is a measure supported on $\{t \ge 0 : X_t \in \partial K\}$ and $\nu_t$ is an outer unit normal at $X_t$ for
any such $t$. Recall the definition of LMC (1), and let us couple it with the continuous process $(X_t)$ as
follows. Let $(Y_t)$ be a process constant on each interval $[k\eta, (k+1)\eta)$ and satisfying

\[
Y_{(k+1)\eta} = P_K\left( Y_{k\eta} + W_{(k+1)\eta} - W_{k\eta} - \frac{\eta}{2} \nabla f(Y_{k\eta}) \right), \tag{14}
\]

for every integer $k$. The purpose of this section is to give a bound on the total variation between
$X_t$ and its discretization $Y_t$.

3.1 Mixing time for the continuous process 

Since $\nabla f$ is assumed to be globally Lipschitz, the existence of the reflected diffusion is ensured
by [Tanaka, 1979, Theorem 4.1]. Itô's formula then shows that $(X_t)$ is a Markov process whose
generator is the operator $\mathcal{L}$

\[
\mathcal{L} h = \frac{1}{2} \Delta h - \frac{1}{2} \langle \nabla f, \nabla h \rangle
\]

with Neumann boundary condition. Together with Stokes' formula, one can see that the measure

\[
\mu(dx) = Z^{-1} e^{-f(x)} \, \mathbb{1}_K(x) \, dx
\]

(where $Z$ is the normalization constant) is the unique stationary measure of the process, and that it
is even reversible.

We first show that if $f$ is convex the mixing time estimate of the previous section remains valid.
Again given a probability measure $\nu$ supported on $K$ we let $\nu P_t$ be the law of $X_t$ when $X_0$ has law
$\nu$.

Lemma 7 If $f$ is convex then for every $x, x' \in K$

\[
\mathrm{TV}(\delta_x P_t, \delta_{x'} P_t) \le \frac{|x - x'|}{\sqrt{2\pi t}}.
\]

Proof As in the proof of Lemma 4, let $(X_t)$ and $(X'_t)$ be two reflected diffusions starting from $x$
and $x'$ and such that the underlying Brownian motions are coupled by reflection. In addition to (6),
one also has

\[
\langle \nabla g(X_t - X'_t), \nabla f(X_t) - \nabla f(X'_t) \rangle \ge 0,
\]

by convexity of $f$. The argument then goes through verbatim. ■

As in Section 2.3, this lemma allows us to give the following bound on the mixing time of $(X_t)$.

Proposition 4 For any $t > 0$

\[
\mathrm{TV}(\delta_0 P_t, \mu) \le C \min\left( m \, t^{-1/2}, \ e^{-t/(2R^2)} \right),
\]

where $C$ is a universal constant.
3.2 A change of measure argument

Again let $(X_t)$ be the reflected diffusion (13). Assume that $(X_t)$ starts from 0 and let $(Z_t)$ be the
process

\[
Z_t = W_t - \frac{1}{2} \int_0^t \nabla f(X_s) \, ds. \tag{15}
\]

Observe that $(X_t)$ solves the Skorokhod problem for $(Z_t)$. Following the same steps as in the
previous section we let

\[
\bar{Z}_t = Z_{\eta \lfloor t/\eta \rfloor}
\]

and we let $(\bar{X}_t)$ be the solution of the Skorokhod problem for $(\bar{Z}_t)$. In other words $(\bar{X}_t)$ is constant
on intervals of the form $[k\eta, (k+1)\eta)$ and for every integer $k$

\[
\bar{X}_{(k+1)\eta} = P_K\left( \bar{X}_{k\eta} + Z_{(k+1)\eta} - Z_{k\eta} \right). \tag{16}
\]

Clearly $(\bar{X}_t)$ and $(Y_t)$ are different processes (unless the potential $f$ is constant). However,
we show in this subsection that using a change of measure trick similar to the one used in Dalalyan
[2014], it is possible to bound the total variation distance between $\bar{X}_t$ and $Y_t$. Recall first the
hypothesis made on the potential $f$

\[
|\nabla f(x)| \le L, \quad |\nabla f(x) - \nabla f(y)| \le \beta |x - y|, \quad \forall x, y \in K.
\]


Lemma 8 Let $T$ be an integer multiple of $\eta$. Then

\[
\mathrm{TV}(\bar{X}_T, Y_T) \le \frac{\sqrt{L \beta}}{2} \left( \mathbb{E}\left[ \int_0^T |X_s - \bar{X}_s| \, ds \right] \right)^{1/2}.
\]

Proof Write $T = k\eta$. Given a continuous path $(w_t)_{t \le k\eta}$ we define a map $Q$ from the space of
sample paths to $\mathbb{R}^n$ by setting $Q(w) = x_k$ where $(x_i)$ is defined inductively as

\[
\begin{aligned}
& x_0 = 0 \\
& x_{i+1} = P_K\left( x_i + w_{(i+1)\eta} - w_{i\eta} - \frac{\eta}{2} \nabla f(x_i) \right), \quad i \le k - 1.
\end{aligned}
\]

Observe that with this notation we have $Y_{k\eta} = Q((W_t)_{t \le k\eta})$. On the other hand, letting $(u_t)$ be the
process

\[
u_t = \frac{1}{2} \left( \nabla f(\bar{X}_t) - \nabla f(X_t) \right),
\]

letting $\tilde{W}_t = W_t + \int_0^t u_s \, ds$ and using equation (16), it is easily seen that

\[
\bar{X}_{k\eta} = Q\left( (\tilde{W}_t)_{t \le k\eta} \right).
\]

This yields the following inequality for the relative entropy of $\bar{X}_{k\eta}$ with respect to $Y_{k\eta}$:

\[
H(\bar{X}_{k\eta} \mid Y_{k\eta}) \le H\left( (\tilde{W}_t)_{t \le k\eta} \mid (W_t)_{t \le k\eta} \right). \tag{17}
\]

Since $\tilde{W}$ is a Brownian motion plus a drift (observe that the process $(u_t)$ is adapted to the natural
filtration of $(W_t)$) it follows from Girsanov's formula, see for instance Proposition 1 in Lehec
[2013], that

\[
H\left( (\tilde{W}_t)_{t \le k\eta} \mid (W_t)_{t \le k\eta} \right) \le \frac{1}{2} \, \mathbb{E}\left[ \int_0^{k\eta} |u_t|^2 \, dt \right]
= \frac{1}{8} \, \mathbb{E}\left[ \int_0^{k\eta} |\nabla f(\bar{X}_t) - \nabla f(X_t)|^2 \, dt \right].
\]

Plugging this back in (17) and using the hypothesis made on $f$ we get

\[
H(\bar{X}_{k\eta} \mid Y_{k\eta}) \le \frac{L \beta}{4} \, \mathbb{E}\left[ \int_0^{k\eta} |X_t - \bar{X}_t| \, dt \right].
\]

We conclude by Pinsker's inequality. ■

The purpose of the next two subsections is to estimate the transportation and total variation
distances between $X_t$ and $\bar{X}_t$.


3.3 Estimation of the Wasserstein distance 


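First we extend Lemma 2 and Lemma 3 to the general case.


Lemma 9 We have, for all $t > 0$

\[
\mathbb{E}\left[ \int_0^t h_K(\nu_s) \, L(ds) \right] \le \frac{(n + RL) \, t}{2}.
\]

Proof As in the proof of Lemma 2, Itô's formula yields

\[
2 \int_0^t h_K(\nu_s) \, L(ds) \le 2 \int_0^t \langle X_s, dW_s \rangle - \int_0^t \langle X_s, \nabla f(X_s) \rangle \, ds + nt + |X_0|^2 - |X_t|^2.
\]

Assume that $X_0 = 0$, note that the first term is a martingale and observe that $|\langle X_s, \nabla f(X_s) \rangle| \le RL$
by hypothesis. Taking expectation in the previous display, we get the result. ■

Recall the definition of the process $(Z_t)$:

\[
Z_t = W_t - \frac{1}{2} \int_0^t \nabla f(X_s) \, ds,
\]

and recall that $(\bar{Z}_t)$ is its discretization: $\bar{Z}_t = Z_{\eta \lfloor t/\eta \rfloor}$.

Lemma 10 There exists a universal constant $C$ such that

\[
\mathbb{E}\left[ \sup_{[0,t]} \|Z_s - \bar{Z}_s\|_K \right] \le C M n^{1/2} \eta^{1/2} \log(t/\eta)^{1/2} + \frac{\eta L}{2r}.
\]

Proof Since for every $x \in \mathbb{R}^n$

\[
\|\nabla f(x)\|_K \le \frac{1}{r} |\nabla f(x)| \le \frac{L}{r},
\]

we have

\[
\|Z_t - \bar{Z}_t\|_K \le \|W_t - \bar{W}_t\|_K + \frac{1}{2} \int_{\eta \lfloor t/\eta \rfloor}^{t} \|\nabla f(X_s)\|_K \, ds
\le \|W_t - \bar{W}_t\|_K + \frac{\eta L}{2r}
\]

for every $t > 0$. Together with Lemma 3, we get the result. ■

As in Section 2.2, combining these two lemmas yields the following estimate.

Proposition 5 For every time $T$, we have

\[
\mathbb{E}[|X_T - \bar{X}_T|] \le C \left( C_1 \, (\eta \log(T/\eta))^{1/4} \, T^{1/2} + C_2 \, \eta^{1/2} \, T^{1/2} \right),
\]

where $C$ is a universal constant and where

\[
C_1 = C_1(K, f) = (n + RL)^{1/2} \, n^{1/4} \, M^{1/2},
\]
\[
C_2 = C_2(K, f) = (n + RL)^{1/2} \, L^{1/2} \, r^{-1/2}.
\]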

3.4 From Wasserstein distance to total variation 

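Unless $f$ is constant, the diffusion $(Z_t)$ does not satisfy Lemma 5 so we need to proceed somewhat
differently from what was done in Section 2.4. We start with a simple lemma showing that $\mu$ does
not put too much mass close to the boundary of $K$.

Lemma 11 Let $\gamma > 0$. One has

\[
\mu(\{x \in K, \ d(x, \partial K) \le \gamma\}) \le \frac{n \gamma + R L \gamma}{r}.
\]

Proof Define

\[
K_\gamma := \{x \in K; \ d(x, \partial K) \ge \gamma\}.
\]

Let $B^n$ be the Euclidean ball, since $K$ contains $r B^n$ and is convex we have

\[
\left( 1 - \frac{\gamma}{r} \right) K + \frac{\gamma}{r} \, r B^n \subset K,
\]

hence

\[
\left( 1 - \frac{\gamma}{r} \right) K \subset K_\gamma.
\]

Clearly this implies:

\[
\int_{K_\gamma} e^{-f(x)} \, dx \ge \left( 1 - \frac{\gamma}{r} \right)^n \int_K e^{-f((1 - \gamma/r) y)} \, dy.
\]

Since $f$ is Lipschitz with constant $L$ one also has

\[
f((1 - \gamma/r) y) \le f(y) + \frac{L \gamma |y|}{r} \le f(y) + \frac{R L \gamma}{r}
\]

for every $y \in K$. Combining the last two displays, we obtain

\[
\int_{K_\gamma} \exp(-f(x)) \, dx \ge \left( 1 - \frac{\gamma}{r} \right)^n e^{-RL\gamma/r} \int_K e^{-f(x)} \, dx
\ge \left( 1 - \frac{n \gamma}{r} - \frac{R L \gamma}{r} \right) \int_K e^{-f(x)} \, dx,
\]

which is the result. ■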
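
A one-dimensional example illustrates the boundary estimate, which as reconstructed from the proof above reads $\mu(\{d(x, \partial K) \le \gamma\}) \le (n + RL)\gamma / r$. The sketch assumes $K = [-1, 1]$ (so $n = r = R = 1$) and the linear potential $f(x) = Lx$, which is convex, $L$-Lipschitz and smooth; both choices are ours and only serve as a sanity check.

```python
import math

def boundary_layer_mass(L, gamma):
    """mu({x in K : d(x, boundary of K) <= gamma}) for K = [-1, 1] and the
    potential f(x) = L * x, with 0 < gamma <= 1."""
    def mass(a, b):
        # Exact integral of exp(-L * x) over the interval [a, b].
        return (math.exp(-L * a) - math.exp(-L * b)) / L
    total = mass(-1.0, 1.0)
    return (mass(-1.0, -1.0 + gamma) + mass(1.0 - gamma, 1.0)) / total

def lemma11_bound(n, R, r, L, gamma):
    """Right-hand side (n + R L) * gamma / r of the boundary estimate."""
    return (n + R * L) * gamma / r

# Pairs (mass of the boundary layer, bound) over a small grid of parameters.
pairs = [(boundary_layer_mass(L, g), lemma11_bound(1, 1.0, 1.0, L, g))
         for L in (0.5, 1.0, 3.0) for g in (0.01, 0.1, 0.3)]
```

For large $L$ the measure concentrates near the endpoint $-1$, which is why the term $RL\gamma/r$ is needed in addition to the purely geometric term $n\gamma/r$.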

Here is a simple bound on the speed of a Brownian motion with drift. 


Lemma 12 Let $(W_t)$ be a standard Brownian motion (starting from 0), let $(v_t)$ be an adapted drift
satisfying $|v_t| \le L$ (almost surely), and $(Z_t)$ the process given by

\[
Z_t = W_t + \int_0^t v_s \, ds.
\]

Then for every $t > 0$ and every $\gamma > 0$

\[
\mathbb{P}\left( \sup_{s \in [0,t]} |Z_s| > \gamma \right) \le \frac{\sqrt{nt} + Lt}{\gamma}.
\]

Proof By the triangle inequality and since $|v_t| \le L$, we have

\[
|Z_s| \le |W_s| + Ls,
\]

for any $s$. Now the process $(|W_s| + Ls)$ is a non-negative submartingale so by Doob's maximal
inequality

\[
\mathbb{P}\left( \sup_{s \in [0,t]} (|W_s| + Ls) > \gamma \right) \le \frac{\mathbb{E}[|W_t| + Lt]}{\gamma}.
\]

Since $\mathbb{E}[|W_t|] \le \sqrt{nt}$ we get the result. ■
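
The bound of Lemma 12 can be compared with a simulation. The sketch below is a Monte Carlo experiment under our own assumptions: the drift is constant, equal to $L$ in the first coordinate, and the path is observed on a finite grid, which can only underestimate the supremum. It is a plausibility check of the inequality $\mathbb{P}(\sup_{s \le t} |Z_s| > \gamma) \le (\sqrt{nt} + Lt)/\gamma$, not a proof, and the function name is hypothetical.

```python
import math
import random

def sup_norm_exceeds(n, t, drift, gamma, steps, rng):
    """Simulate Z_s = W_s + drift * s * e_1 on a grid of [0, t] and report
    whether |Z_s| exceeds gamma at some grid point."""
    dt = t / steps
    z = [0.0] * n
    for _ in range(steps):
        z = [z[i] + math.sqrt(dt) * rng.gauss(0.0, 1.0) for i in range(n)]
        z[0] += drift * dt  # constant drift of norm `drift`
        if math.sqrt(sum(v * v for v in z)) > gamma:
            return True
    return False

rng = random.Random(1)
n, t, L, gamma = 2, 1.0, 1.0, 4.0
trials = 2000
hits = sum(sup_norm_exceeds(n, t, L, gamma, 100, rng) for _ in range(trials))
empirical = hits / trials                        # Monte Carlo estimate
bound = (math.sqrt(n * t) + L * t) / gamma       # bound of Lemma 12
```

In this regime the empirical frequency is far below the bound, which is expected since the estimate relies only on Doob's inequality in its first-moment form.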


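Proposition 6 Let $T$ and $S$ be integer multiples of $\eta$. We have

\[
\mathrm{TV}(X_{T+S}, \bar{X}_{T+S}) \le C \left( W(T) \, S^{-1/2} + \mathrm{TV}(X_T, \mu) + C_3 \, S^{1/4} + C_4 \, S^{1/2} + C_5 \, W(T)^{1/2} \right),
\]

where $C$ is a universal constant, $W(T)$ is the bound obtained in Proposition 5 and

\[
\begin{aligned}
C_3 &= n^{1/4} R^{1/2} r^{-1/2} L^{1/2} + n^{3/4} r^{-1/2}, \\
C_4 &= R^{1/2} r^{-1/2} L + n^{1/2} r^{-1/2} L^{1/2}, \\
C_5 &= R^{1/2} r^{-1/2} L^{1/2} + n^{1/2} r^{-1/2}.
\end{aligned}
\]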
Proof The proof follows similar lines to those of the proof of Proposition 3, but the drift term
requires some additional bounds which will be provided by the previous two lemmas.

We begin with fixing two points $x, x' \in K$ and we consider the two associated diffusion
processes $(X_t)$ and $(X'_t)$, which start from the points $x$ and $x'$ respectively, such that the underlying
Brownian motions are coupled by reflection. In other words, those processes satisfy equations (4)
and (5) with the additional drift term.

In analogy with the proeess (Zt), let {Z[) be the proeess 

let z[ = and let {x[) be the solution of the Skorokhod problem for {z[). We proeeed as in 

the proof of Proposition 3, letting $\tau$ be the coupling time of $(X_t)$ and $(X'_t)$ and letting $\sigma$ and $\sigma'$ be
the first times $(X_t)$ and $(X'_t)$ hit the boundary of $K$, we have that

$$\mathbb{P}(X_S \neq X'_S) \le \mathbb{P}(\tau > S) + \mathbb{P}(\sigma < S) + \mathbb{P}(\sigma' < S).$$


Moreover the coupling time $\tau$ still satisfies

$$\mathbb{P}(\tau > S) \le \frac{|x - x'|}{\sqrt{2\pi S}}.$$

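Now fix $\gamma > 0$ and observe that if $d(x, \partial K) > \gamma$, then $\sigma$ is at least the first time the process
$(X_t)$ hits the sphere centered at $x$ of radius $\gamma$. So, by Lemma 12 (applied to $X_t - x$, which up to
time $\sigma$ is a Brownian motion with a drift of norm at most $L$),

$$\mathbb{P}(\sigma < S) \le \frac{\sqrt{nS} + LS}{\gamma} + \mathbf{1}_{\{d(x, \partial K) \le \gamma\}}.$$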

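There is a similar inequality for $\sigma'$ and we obtain

$$\begin{aligned}
\mathbb{P}(X_S \neq X'_S) &\le \frac{|x - x'|}{\sqrt{2\pi S}} + \frac{2\sqrt{nS} + 2LS}{\gamma} + \mathbf{1}_{\{d(x, \partial K) \le \gamma\}} + \mathbf{1}_{\{d(x', \partial K) \le \gamma\}} \\
&\le \frac{|x - x'|}{\sqrt{2\pi S}} + \frac{2\sqrt{nS} + 2LS}{\gamma} + 2\,\mathbf{1}_{\{d(x, \partial K) \le 2\gamma\}} + \mathbf{1}_{\{|x - x'| > \gamma\}}.
\end{aligned}$$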


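So if $T$ and $S$ are two integer multiples of $\eta$, if $(X_t)$ and $(\bar X_t)$ start from $0$, are coupled using the
same Brownian motion up to time $T$, and using the reflection coupling between time $T$ and $T + S$,
then we have

$$\mathbb{P}(X_{T+S} \neq \bar X_{T+S}) \le \frac{\mathbb{E}[|X_T - \bar X_T|]}{\sqrt{2\pi S}} + \frac{2\sqrt{nS} + 2LS}{\gamma} + 2\,\mathbb{P}\big(d(X_T, \partial K) \le 2\gamma\big) + \mathbb{P}\big(|X_T - \bar X_T| > \gamma\big).$$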
By Lemma 11,

$$\mathbb{P}\big(d(X_T, \partial K) \le 2\gamma\big) \le \mu\big(d(x, \partial K) \le 2\gamma\big) + \mathrm{TV}(X_T, \mu) \le \frac{2(RL + n)\gamma}{r} + \mathrm{TV}(X_T, \mu),$$


and an application of Markov's inequality gives

$$\mathbb{P}\big(|X_T - \bar X_T| > \gamma\big) \le \frac{\mathbb{E}[|X_T - \bar X_T|]}{\gamma}.$$


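Combining the last three displays together, we finally obtain

$$\mathbb{P}(X_{T+S} \neq \bar X_{T+S}) \le \frac{\mathbb{E}[|X_T - \bar X_T|]}{\sqrt{2\pi S}} + \frac{2\sqrt{nS} + 2LS}{\gamma} + \frac{4(RL + n)\gamma}{r} + 2\,\mathrm{TV}(X_T, \mu) + \frac{\mathbb{E}[|X_T - \bar X_T|]}{\gamma}.$$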


Optimizing over $\gamma$ and using Proposition 5 yields the desired inequality.
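For the reader's convenience, here is a sketch of the elementary optimization behind this last step. It assumes that the bound to be optimized has the form $a + A/\gamma + B\gamma$, with $A = 2\sqrt{nS} + 2LS + \mathbb{E}[|X_T - \bar X_T|]$ and $B = 4(RL + n)/r$, where $a$ collects the terms that do not depend on $\gamma$. The abbreviations $a$, $A$, $B$ are introduced here for illustration only.

```latex
\min_{\gamma > 0} \left( \frac{A}{\gamma} + B\gamma \right) = 2\sqrt{AB},
\qquad \text{attained at } \gamma = \sqrt{A/B}.

% Using \sqrt{u + v + w} \le \sqrt{u} + \sqrt{v} + \sqrt{w} for u, v, w \ge 0:
\sqrt{AB} \le 2\sqrt{\frac{RL + n}{r}}
\left( \sqrt{2}\, n^{1/4} S^{1/4} + \sqrt{2}\, L^{1/2} S^{1/2}
       + \mathbb{E}\big[|X_T - \bar X_T|\big]^{1/2} \right).
```

Since $\sqrt{(RL + n)/r} \le (R^{1/2} L^{1/2} + n^{1/2})\, r^{-1/2}$, this is where the powers $S^{1/4}$, $S^{1/2}$ and $W(T)^{1/2}$, with prefactors of the same shape as $C_5$, enter the final estimate.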


3.5 Proof of Theorem 1 

This subsection contains straightforward calculations to help the reader put together the results 
proven above. Hereafter, to simplify notation, the constants c, C will represent positive universal 
constants whose value may change between different appearances. 

Let $T$ and $S$ be integer multiples of $\eta$ and write

$$\mathrm{TV}(Y_{T+S}, \mu) \le \mathrm{TV}(Y_{T+S}, \bar X_{T+S}) + \mathrm{TV}(\bar X_{T+S}, X_{T+S}) + \mathrm{TV}(X_{T+S}, \mu).$$

Again, we will not try to give an optimal result in terms of all the parameters. So assume for
simplicity that $K$ contains the Euclidean ball of radius $1$ so that $r$ is replaced by $1$ in constants
$C_2$, $C_3$, $C_4$ and $C_5$. Also let

$$n_* = \max(n, RL, R\beta).$$

Keeping in mind that $S$ shall be chosen to be rather small (hence assuming $S \le 1$), Proposition 6
is easily seen to imply that

$$\frac{1}{C}\,\mathrm{TV}(X_{T+S}, \bar X_{T+S}) \le W(T)\,S^{-1/2} + \mathrm{TV}(X_T, \mu) + n_* S^{1/4} + (n_* W(T))^{1/2}.$$

Together with Lemma 8 and Proposition 4 we get 

^TV(Ft+s,/w) < {L^T + + W{T)S-^/^ + 

(_y 

Fix $\varepsilon > 0$ and choose

$$S = n_*^{-4} \varepsilon^4, \qquad T = R^2 \log(1/\varepsilon).$$
Then it is easy to see that it is enough to pick $\eta$ small enough so that

$$W(T) \le C n_*^{-2} \varepsilon^3 \log(1/\varepsilon)^{-1},$$

to ensure $\mathrm{TV}(Y_{T+S}, \mu) \le C\varepsilon$. Now Proposition 5 clearly yields

$$W(T) \le C n_* \left(\eta \log(T/\eta)\right)^{1/4} T^{1/2}.$$

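Recall that $T = R^2 \log(1/\varepsilon)$ and observe that

$$\eta \le c\, \frac{\varepsilon^{12}}{n_*^{12} R^4 \max(\log(n), \log(R), \log(1/\varepsilon))^{7}}$$

suits our purpose. Lastly for this choice of $\eta$ the number of steps in the algorithm is

$$\frac{T + S}{\eta} \le C\, \frac{n_*^{12} R^6 \max(\log(n), \log(R), \log(1/\varepsilon))^{8}}{\varepsilon^{12}}.$$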

4 Experiments 

Comparing different Markov Chain Monte Carlo algorithms is a challenging problem in and of
itself. Here we choose the following simple comparison procedure based on the volume algorithm
developed in Cousins and Vempala [2014]. This algorithm, whose objective is to compute the
volume of a given convex set $K$, proceeds in phases. In each phase $\ell$ it estimates the mean of
a certain function under a multivariate Gaussian restricted to $K$ with (unrestricted) covariance
$\sigma_\ell^2 I_n$. Cousins and Vempala provide a Matlab implementation of the entire algorithm, where in
each phase the target mean is estimated by sampling from the truncated Gaussian using the
hit-and-run (H&R) chain. We implemented the same procedure with LMC instead of H&R, and we
choose the step-size $\eta = 1/(\beta n^2)$, where $\beta$ is the smoothness parameter of the underlying
log-concave distribution (in particular here $\beta = 1/\sigma_\ell^2$). The intuition for the choice of the step-size
is as follows: the scaling in inverse smoothness comes from the optimization literature, while the
scaling in inverse dimension squared comes from the analysis in the unconstrained case in Dalalyan
[2014].
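To make the sampling step concrete, here is a minimal sketch of projected LMC for a Gaussian restricted to the box $[-1,1]^n$, using the update rule (1) from the introduction and the step-size $\eta = 1/(\beta n^2)$. This is not the Matlab code used for the experiments: the function names, the pure-Python implementation, and the restriction to the box (whose Euclidean projection is coordinate-wise clipping) are assumptions made for illustration.

```python
import math
import random


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def projected_lmc_box(n, sigma2, num_steps, seed=0):
    """Run projected LMC for the potential f(x) = |x|^2 / (2 * sigma2) on [-1, 1]^n
    and return the final iterate."""
    rng = random.Random(seed)
    beta = 1.0 / sigma2           # smoothness parameter of f
    eta = 1.0 / (beta * n ** 2)   # step-size eta = 1 / (beta * n^2)
    x = [0.0] * n                 # the chain starts at the origin
    for _ in range(num_steps):
        grad = [xi / sigma2 for xi in x]  # gradient of f at x
        # One step of (1): x <- P_K(x - (eta / 2) * grad f(x) + sqrt(eta) * xi).
        x = project_box([
            xi - 0.5 * eta * gi + math.sqrt(eta) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)
        ])
    return x


if __name__ == "__main__":
    sample = projected_lmc_box(n=10, sigma2=0.5, num_steps=500)
    print(sample)
```

For a body such as the "Box and Ball", `project_box` would have to be replaced by a projection onto the intersection, which is no longer a simple clipping operation.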
[Figure: two panels, "Estimated normalized volume" and "Time".]


We ran the volume algorithm with both H&R and LMC on the following set of convex bodies:
$K = [-1,1]^n$ (referred to as the "Box") and the intersection of $[-1,1]^n$ with a Euclidean ball (referred
to as the "Box and Ball"), where $n = 10 \times k$, $k = 1, \ldots, 10$. The computed volume (normalized by $2^n$ for the
"Box" and by $0.2 \times 2^n$ for the "Box and Ball") as well as the clock time (in seconds) to terminate
are reported in the figure above. From these experiments it seems that LMC and H&R roughly
compute similar values for the volume (with H&R being slightly more accurate), and LMC is
almost always a bit faster. These results are encouraging, but much more extensive experiments
are needed to decide if LMC is indeed a competitor to H&R in practice.